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Abstract 


This  paper  presents  a  simple  unifying  framework  for  a  wide  class  of  conjugate  directions  al¬ 
gorithms  whose  iterates  minimize  some  quadratic  functional  over  a  subspace.  Our  approach  is 
motivated  by  its  advantages  for  nonlinear  minimization,  but  the  purpose  of  this  paper  is  to 
present  the  greatly  simplified  convergence  analysis  that  results  for  the  linear  case. 
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GENERALIZED  CONJUGATE  DIRECTIONS 


Introduction 

We  present  a  simple  unifying  framework  for  a  class  of  conjugate  directions  algorithms  for 
the  solution  of  the  linear  system 

Ax  =  b  ,  where  A  G  !Rn  ^  “  is  large,  sparse,  and  nonsingular. 

The  class  under  consideration  consists  of  algorithms  that  minimize  an  error  functional  over 
a  subspace  or  affine  space  at  each  step.  This  class  includes  the  standard  and  the  preconditioned 
conjugate  gradient  algorithms,  the  conjugate  residual  algorithm,  Craig’s  method,  Elman’s  general¬ 
ized  conjugate  residual  algorithm,  Vinsome’s  ORTHOMIN  algorithm,  the  ORTHODIR  algorithm 
of  Young  and  Jea,  the  GMRES  algorithm  of  Saad  and  Schultz,  and  the  truncated  algorithms 
ORTHOMIN(fc)  and  ORTIIODIR(« ).  The  main  point  of  this  paper  is  that  our  approach  leads  to 
a  single  simple  geometric  theorem  giving  the  standard  convergence  results  for  all  the  untruncated 
methods  without  the  usual  clutter  of  lemmas.  Another  theorem,  which  we  give  without  proof 
because  the  proof  is  so  similiar,  suffices  for  the  truncated  methods.  Furthermore,  this  paper 
should  help  clear  up  the  common  misconceptions  about  the  relationship  between  these  methods 
and  the  Krylov  subspace. 

The  approach  we  have  taken  in  formulating  the  Generalized  Conjugate  Directions  (GCD) 
algorithm  is  based  on  an  idea  suggested  to  us  by  Peter  Huber,  who  reported  finding  it  useful  in 
practice  for  nonlinear  problems.  It  is  implicit  in  work  by  others  as  well,  especially  Miele  and  Can¬ 
trell  (1969),  Cantrell  (1969),  Cragg  and  Levy  (1969),  and  Nazareth  (1984).  The  idea  is  simple;  its 
key  feature  is  to  take  the  inverse  point  of  view  to  the  usual  one  of  generating  conjugate  directions 
and  then  minimizing  in  the  last  direction  generated. 

Our  formulation  is  similar  in  spirit  to  Axelsson’s  generalized  conjugate  gradient  algorithm, 
but  there  is  an  important  difference:  we  have  used  a  basis  for  the  subspace  over  which  minimiza¬ 
tion  must  be  carried  out  at  each  step  that  allows  us  to  explicitly  solve  the  minimization  problem 
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for  the  general  case.  Our  formulation  facilitates  analysis  of  domains  of  convergence  and  the 
number  of  previous  directions  required  to  be  saved  for  these  algorithms.  We  show  a  q-linear  con¬ 
vergence  rate  under  a  very  mild  hypothesis  for  the  class  of  algorithms. 

The  conjugate  directions  formulation  we  use  for  solving  a  linear  system  is  also  applicable  to 
the  solution  by  conjugate  directions  of  equality  constrained  quadratic  programming  problems,  and 
we  discuss  this  adaptation  of  the  GCD  algorithm. 

The  Generalized  Conjugate  Directions  Algorithm 

Our  GCD  algorithm  is  stated  as  a  means  of  finding  the  minimizer  of  the  quadratic 

q{x)  =  -i-a :THx~hTx  , 

where  we  assume  H  is  symmetric  positive  definite.  Thus,  g(ar)  has  a  unique  minimizer,  and 
finding  it  is  equivalent  to  finding  the  zero  of 

Vq(x)  =  Hx -h  . 

To  apply  the  algorithm  to  the  solution  of  Ax  —  b ,  one  may  use  H  —  A,  h  —  b  if  A  is  symmetric 
positive  definite,  or  H  =  ZA,  h  =  Zb ,  where  Z  is  nonsingular  and  ZA  is  symmetric  positive 
definite. 

We  use  the  notation 

n  =  =  h-Hxt 

and 

rk  =  b -Axk  . 

The  minimizer  of  q{x)  is 

x*  =  H-'h  , 

q{x*)  =  ~hTH~1h  . 


and  the  minimum  value  of  g(x)  is 
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We  define  ||  x  -x*  ||  #  s  (a?  -  x‘)T  H{x  -x'),  and  note  that 

(x  - x‘)T H(x  -  x *)  —  xT  Hx -2x7  Hx*  +  x*rHx*  —  2[g(a:)-g(ar*)] , 
so  that  minimizing  q(x)  is  equivalent  to  minimizing 

The  GCD  algorithm  produces  a  sequence  pj,p2,  •  •  •  of  linearly  independent,  mutually  H- 
conjugate  (p7Hp}-  —  0,  i  directions.  Conceptually,  we  do  not  choose  these  directions. 

Rather,  at  each  step  of  the  algorithm,  we  choose  a  direction  dk  in  any  manner  whatever,  requiring 
only  that  it  not  be  orthogonal  to  the  gradient  of  q  at  z*_j.  We  determine  the  next  iterate  xk  as 
the  minimizer  of  q(x)  in  ap  {plf pk_lt  dk},  the  subspace  spanned  by  {pi, ...,  p*_i,  dk  },  and  define 
P*  ==  xk  ~  xk-i-  Clearly,  the  choice  of  dk ,  which  we  now  discard,  determines  p*. 

For  ease  of  exposition,  we  use  the  initial  guess  x0  =  0.  This  is  not  restrictive.  If  a  better 
approximate  solution,  say  x,  is  known,  the  problem  Ax  =  b  with  initial  guess  x  can  be  solved  by 
solving  A(x -x )  =  b -Ax  with  initial  guess  0.  It  will  be  useful  to  let  argminq{x)  denote  the 

minimizer  of  q(x)  on  5.  Our  GCD  algorithm  is  as  follows. 

Generalized  Conjugate  Directions  Algorithm 

*o  =  0,  r  o  =  h  ,  k  =  1 . 
while  r*_!  0 

io 

get  dk  such  that  dk  ^  0 

**  =  argmin  q{x) 

*€»p{  Pj . Pt- vit) 

?*  =  **-**_! 

r*  =  rk_x-Hpk 

k  =  k  +  1 


end  do 
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The  GCD  algorithm  with  choices  of  H,  h,  and  dk  given  in  columns  3-5  in  the  following 
table  is  equivalent  to  the  algorithms  named  in  column  1. 


Algorithm 

Assumptions 

H 

h 

4 

CG 

A  spd 

A 

b 

»Vi 

CG  applied  to 
normal  equations 

A  nonsingular 

A  T  A 

ATb 

rk-i 

Craig’s  Method 

A  nonsingular 

AA  T 

b 

»•*- 1 

CR 

A  spd 

Ar  A 

ATb 

r*-i 

PCG 

A  spd 

A 

b 

M'Vt-j 

GCR 

A+A  T  pd 

AtA 

ATb 

nt-i 

ORTHOMIN 

A+At  pd 

A  T  A 

ATb 

r*-i 

ORTHODIR 

use  Z  such  that 
Z+ZT  is  pd 
and  ZA  is  spd 

ZA 

Zb 

di=b\  for  k>2, 
4=Ap*_  i 

GMRES 

A  nonsingular 

ata 

AT  b 

CG  denotes  the  standard  conjugate  gradient  algorithm,  and  CR  denotes  the  conjugate  resi¬ 
dual  algorithm.  For  the  solution  of  the  linear  system  Ax  =  b  using  Craig’s  method,  put  x  —  ATy 
and  solve  AATy  =  b.  The  computation  can  be  arranged  so  that  iterates  in  y  are  not  actually 
computed.  PCG  denotes  the  preconditioned  conjugate  gradient  algorithm.  To  solve  the  linear 
system  Ax  =  6  using  the  preconditioned  conjugate  gradient  algorithm,  one  finds  a  nonsingular 
symmetric  matrix  C  1  such  that  C  iAC~1  is  better  conditioned  than  A,  and  transforms  Ax  =  b  to 
(G  lAC~l)(Cx)=  C~lb .  The  matrix  M  that  appears  in  the  choice  of  dk  for  PCG  is  defined  to  be 
C2,  and  is  called  the  preconditioner.  Vinsome’s  ORTHOMIN  using  all  previous  directions  is 
equivalent  to  Elman’s  generalized  conjugate  residual  algorithm,  denoted  GCR.  The  ORTHODIR 
algorithm  as  given  by  Young  and  Jea  is  stated  as  requiring  only  that  Z  be  nonsingular  and 
(ZA)  +  (ZA)T  positive  definite,  but  ORTHODIR  is  included  in  the  class  of  algorithms  under 
present  consideration  (those  algorithms  that  minimize  the  error  functional  1 1  x  -  x  *  1 1  H  over  a  sub¬ 
space  or  affine  space  at  each  step)  only  if  ZA  is  symmetric  positive  definite.  Young  and  Jea 
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require  the  symmetric  part  of  Z  positive  definite  in  order  to  ensure  xk  ^  xk_i  •  ^Ve  have  included 
this  condition  here,  and  will  first  discuss  the  equivalence  of  ORTHODIR  and  GCD  when  this  con¬ 
dition  is  satisfied.  Later,  we  show  a  way  of  viewing  the  ORTHODIR  algorithm  in  our  framework 
when  Z  is  just  nonsingular.  The  GMRES  algorithm  of  Saad  and  Schultz  does  not  compute 
iterates  x k  to  approximate  x  .  Instead,  it  builds  an  orthonormal  basis  { i>  j, . . . ,  vm  }  for  the  Krylov 
space  K(b  ,A ,  n)  =  sp  {6  ,Ab ,  ...,j4n_16  }  and,  using  this  basis,  determines  the  solution  to  Ax  —  b 
as  the  minimizer  of  ||j4z-6  ||2  over  K(b  ,A ,  n).  The  vectors  {i>,}  are  determined  by  Arnoldi’s 
method: 


v,  = 


I  b  II  ’ 


for  j  =  1, m- 1 


»/+i  =  Avj  -  f]  vfAvj  v{ 

i=i 

)  ||  A  |  |  * 

II  +  1  ll 

The  truncated  algorithms  ORTHOMIN(i)  and  ORTHODIR(s)  are  equivalent  to  GCD(m),  which 
we  define  later,  with  the  same  choices  of  H,  h ,  and  dk  shown  for  ORTHOMIN  and  ORTHODIR, 
respectively. 


Here  is  the  first  of  our  two  main  results.  Notice  that  the  Krylov  subspace  is  totally 
separated  from  the  convergence  analysis  for  the  general  method.  Of  course,  a  choice  of  dk  that 
implies  minimization  on  a  Krylov  subspace  may  greatly  reduce  the  computational  complexity  of 
the  associated  method,  as  we  shall  show.  Furthermore,  the  Krylov  subspace  is  essential  to  the 
Chebyshev  polynomial  error  analysis  usually  associated  with  such  methods. 

Theorem  1:  If  H  is  symmetric  positive  definite,  then  the  following  statements  are  true  about  the 
GCD  algorithm: 

(i)  The  algorithm  terminates  after  no  more  than  n  steps,  and  terminates  if  and  only 

if  xk_!  =  x‘. 
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(ii)  xk  minimizes  q(x)  on  «p  {pi, ...,  p*}. 

(iii)  Every  pk  generated  by  the  algorithm  is  nonzero. 

(iv)  f*rpy=0,  1<j  <  k 

(v)  Pi%=o,  tyy 

(vi)  dim  sp  {pi,...,pk}  =  k 

(vii)  P?Hpi  =  p?h  for  all  i 

If  in  addition  dk  Esp  {dl}  Bpi, Bpk_x}  for  some  matrix  B,  then 

(viii)  sp  {pi,  ...,pk}  =K(duB ,k),  so  that  xk  minimizes  q(x)  on  K(dl,B,k). 

(ix)  rfBpi  =  0,  1  <  «  <  k-l. 

Proof: 

The  algorithm  can  terminate  only  because  some  7k_x  =  0.  If  0  =  7k_i  =  -Vq(xk_l),  then  xk_i 
is  a  critical  point  for  q.  Since  \72q(xk_l)  —  H  is  positive  definite,  this  is  necessary  and  sufficient 
for  xk_1—xt.  We  shall  prove  that  the  algorithm  terminates  after  not  more  than  n  steps  by 
proving  (ii)  and  (vi)  for  k  =  n,  because  then  by  (vi),  R"  =  sp  {p1;  ...,p*}  and  xn  =  x'  by  (ii). 

Let  us  proceed  by  induction.  For  k  =  1,  if  F0(=A)  =  0,  we  are  finished.  Otherwise,  we 
choose  dl  such  that  dfr, O^0.  We  can  certainly  do  this  by  taking  dx=  F0,  for  example.  Since 
dfro^O,  sp{d j}  contains  a  descent  direction  for  q  from  x0.  Thus, 

P i  =  Xi-x0  ^  0, 

which  anchors  the  induction  for  (iii)  and  (vi),  and 

«p{pi)  =  sp{dx}, 

which  anchors  (ii).  Since  X\  minimizes  q  on  «p{pi),  «p{pi}  cannot  contain  a  descent  direction 
for  q  from  Z[.  Thus 

0  =  -Vq(x1)TpI  =  rfp  j, 

anchoring  (iv),  and  this  is 


=  (A -Hx1)Tpl  =  [h-Hpi)Tpi, 
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which  anchors  (vii).  Also, 

®p{Pi}  =  =  K{du  B,l) 

for  any  B,  which  anchors  (viii).  Statements  (v)  and  (ix)  are  vacuously  true. 


Now  assume  that  (i)-(vii)  hold  for  1  <  /  <  k— 1.  We  have  already  taken  care  of  the  case 
when  the  algorithm  terminates  because  rk_x  =  0,  so  we  assume  ?*_!  ^  0.  Then  we  can  choose  dk , 
and  ep  {dk}  contains  a  descent  direction  for  q  from  xk_i.  Thus 


0  7^  *k  -  **- 1  =  Pk  ,  (iii)- 

Since  xk_x  E  sp  {pi, p*-i),  and  xk  6  sp  {px, ...,  pk_x,  dk),  we  have  dk  Esp  {pu  ...,pk}.  By  the 
induction  hypothesis,  rk_x  py  =  0  for  1  <  j  <  jfc-1,  so  dk  $  sp  {px, ...,  Hence, 

»p{pi,-,Pk}  =  «p{p . . Pk-i,dk),  (ii) 

and 

dim  sp{px,...,pk}  =  dim  sp{px,...,pk_x,dk} 

=  dim  «?{?!,...,?*_!}+ 1 
—  k-l  +  l  =  k,  (vi) . 

Since  xk  minimizes  q  on  sp{px,...,pk}, 


0  —  ^ q {xk)T Pi  =  -rfpi  for  1  <  i  <  k,  (iv) . 
*  * 

Note  that  xk  =  E  Pj,  so  that  rk  =  h  -  E  Hp}- .  Thus  for  i  <  k, 

y- 1  1 


Pi7’*'*  =  pA 


E  ViHP) 

y=i 


By  the  induction  hypothesis,  this  is 


=  pA  -  p.ryfp.  -  p?Hpk 

=  -  P>r^P*  ,  (v)  • 


Furthermore, 

k 

0  =  Pk  rk  =  p*A  -  E  PkHPj  =  Pkh  -PkHPk  ,  (vii)  • 
y=i 

Now  assume  (viii)  holds  for  1  <  j  <  k-l,  and  that  dk  E  sp{duBpx,  ...,Bpk_x}.  There  exist  a  and 
Pj,  1  <  /  <  Ar-1,  such  that 
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dk  =  «<*1  +  E  PiBPi  —  +  E  £yBPy]  +0k-iBpk-i  ■ 

y= i  l  y=i  i 

By  the  induction  hypothesis,  for  1  <  t  <  k-1,  p{EK(duB,k- 1),  so  that  the  first  term 
€K(duB,k-l)  and  the  second  term  EK{dltB,k)\  thus  dk  EK(duB ,  k).  This,  together  with 
«p{Pi>-,P*}  =  «p{Pi,-,P*-i,<4},  gives 

«p{Pi,..,P*}  =  K{dhB,k),  (viii) . 

For  (ix),  observe  that  by  (iv)  and  (viii),  rk  is  orthogonal  to  K(duB,k).  For  i<*-1, 
Pi  EK(dh  B ,  k-l),  so  that  Bp,-  E  K{di,B,  k)  .  • 

At  each  step  of  the  algorithm,  xk  is  to  be  determined  as 

xk  —  argmin  q(x)- 

*€»p{p, . pt_,,  4k} 

Let  Pk  —  [p1( ...» Pk-\,dk},  x=Pkc,  and 

q(c)  =  q(Pkc)  =  \eT(PkTHPk)c-hTPkc  . 

Then  Vj(c)  =  Pk  HPkc  -Pkh.  Since  H  is  positive  definite  and  Pk  has  full  column  rank,  PkHPk 
is  positive  definite.  Hence,  xk  can  be  found  by  solving  the  linear  system  PkHPkc  =Pkh  for  c, 
and  setting  xk—Pkc. 

The  matrix  of  the  system  of  linear  equations  that  determines  xk  and  hence  pk  is  symmetric 
with  an  arrowhead  structure  in  the  last  row  and  column  by  part  (v)  of  Theorem  1.  All  the  stan¬ 
dard  recursions  for  pk  in  each  algorithm  follow  in  an  insightful  and  simple  way  from  the  general 
solution  of  the  system.  The  way  some  algorithms  under  some  assumptions  can  be  seen  to  need 
only  pk_ i  and  dk  to  generate  pk  is  that  the  last  row  and  column  are  zero  in  all  but  the  last  two 
elements. 

By  (v)  and  (vii)  of  Theorem  1,  the  linear  system  PkHPkc  =  Pkh  is 
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PiHpi 

dkHpi  ' 

Cl 

pftfpl 

P  2  dip  2 

dkTHP2 

c2 

P  2  Up  2 

Pk-iHPk-i  dkHPk- i 

Cjfc- 1 

Pk-lddpk-i 

dkTHPl  dkHp2  . 

•  dkHPk_i  dkHdk 

c* 

dkh 

The  solution  to  the  linear  system  is 


,  pfHdk 

1  -  ak  —= - 

PiHpi 

P%Hdk 


l -ak 


P2HPi 


,  plxHdk 

^  ak  X  jj 

Pk-iHPk-i 

«* 


,  where  ak  — 


dkh  ~  £  dkHPj 
iTuj  {dkHPj)2 


dkTHdk  -  £ 


y==i  PjHpj 


The  numerator  of  ak  is  equal  to  dkrk_u  and  let  us  define,  for  1  <  j  <  Jfc-1, 


Then 


4*>  = 


-dlHPj 

pjHpj 


dkrk- 1 
dkH{dk  +  £ 


and 


xk  =  PkC  =  £  Pj  +  ®*K  +  £  fflPj) 
i= i  y=i 

=  **_i  +  a*(d*  +  X]  P\k^Pj)  ■ 

;'=  l 


When  dk  is  chosen  so  that  dkHp ,•  =  0  for  1  <  t  <  £-2,  the  solution  has  the  shorter  form 


/?*  =  4*1 


-dkHPk-i 

Pk-\Hpk-\  ’ 


a* 


dk  rk-i 

dkTH(dk+^kPk_l)  ’ 


and 
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**  =  **-i  +  <Xk{dk  +PkPk- 1). 

The  standard  conjugate  gradient  algorithm,  conjugate  gradient  algorithm  applied  to  the 
normal  equations,  conjugate  residual  algorithm,  preconditioned  conjugate  gradient  algorithm,  and 
Craig’s  method  all  choose  dk  so  that  only  pk_i  and  dk  are  needed  to  generate  pk  and  xk . 

For  CG  and  CR, 

*-i 

dk  =  =  b  =  b~Y,Apj, 

S=  i 

so  that  the  matrix  B  that  appears  in  the  additional  hypothesis  of  Theorem  1  is  A .  For  CG 

dfHpi  =  r^Api  =  r^Api , 

and  for  CR 

dkHpi  =  r?_lATApi  =  rk_xApi , 
since  A  is  symmetric.  Thus,  for  both  of  these  algorithms, 

dfHpi  =0  for  1  <  i  <  k- 2, 

by  part  (ix)  of  Theorem  1.  For  CG  applied  to  the  normal  equations,  which  we  henceforth  refer  to 
as  CGNE, 

*-i 


dk  =  r*_i  =  h -Hxk- i  =  Hpj  » 


and  for  Craig’s  method, 


dk  =  rk_ i  =  b -AATyk_1  =  h-^HPj , 

i= i 

so  that  the  matrix  B  is  H  for  each  of  these  algorithms,  and 

dkHpf  =  7k-\Hpi  =0  for  1  <  t  <  k-2 . 

For  PCG, 


dk  -JfV, 

and  B  is  M"XA .  Since  M  is  symmetric, 

dkHpi  =  (rk_iM~l)Api  =  7k_i  ( M~lA)p{  =0  for  1  <  i  <  k-2  . 
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GCR  (ORTHOMIN)  corresponds  to  the  same  choices  of  H,  h,  and  dk  as  CR,  but  since  this 
algorithm  is  applied  when  A  is  not  symmetric,  all  of  the  previous  directions  contribute  to  the  cal¬ 
culation  and  must  be  saved. 

For  ORTHODIR,  B  is  A,  and 

dfHpi  =  pk_yA  T  ZApi . 

Young  and  Jea  refer  to  choosing  Z  such  that  Z  (as  well  as  ZA )  is  symmetric  positive  definite  as 
the  “symmetrizable”  case.  If  Z  is  symmetric,  the  above  is 

=  (rk-2-rk_1)T APi , 

so  that 

d?Hpt  =0  for  1  <  t  <  it -3 , 

and  only  the  last  two  previous  directions  must  be  kept.  If  Z  is  not  symmetric,  all  previous  direc¬ 
tions  contribute  to  the  determination  of  the  iterates. 

We  now  verify  convergence  of  these  algorithms.  For  convergence  of  GCD,  we  require  only 
that  0  whenever  r*_i^0.  This  condition  is  easily  seen  to  be  satisfied  for  CG,  CGNE, 

Craig’s  method,  CR,  PCG,  GCR  (ORTHOMIN),  GMRES,  and  ORTHODIR,  under  the  restric¬ 
tions  we  have  shown  for  the  applicability  of  each  algorithm: 


CG 

4rF*_  i  =  rk_irk_l 

CGNE 

dk^k-i  =  ri-iF*.! 

Craig’s  method 

dkrk- 1  =  r*V*-i 

CR 

4rF*-i  =  rk-i^rk_i 

PCG 

dk^k-i  =  rk~\M~lrk_x 

GCR,  ORTHOMIN 

j T—  T  a  T  f  A+AT  ) 

4  rk- 1  =  rk-iArk_i  =  rk_\  - - - 1 

For  GMRES,  we 

cannot  ensure  convergence  for  general  nonsingular  A  by  selecting  dk 

but  we  can  guarantee  this  convergence  by  a  slightly  more  judicious  choice  of  dk,  as  we  shall  dis¬ 
cuss  momentarily.  The  choice  dk  —  vk  gives 
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and 


sP{Pi,  -,Pk}  =  sp{du...,dk)  =  «p{v1;. ,.,vk}  =  K(b,A,k) 


rh  dk  = 


-T  *_1 

r*-i  Avk_x-  X]  vfAv^Vf 

i—i 


Vk 


rh-iAv,,.! 

II4II 

*-i 

rk-iA  E  C<P< 
_ 1  =  1 

\\h\\ 


for  some  scalars  { c,- }  with  ck_ 

(because  vk_x$  ap  {pu  Pk_2}) 


C*-l  —T  !  \ 

Trr7rr*-i(r*-2->,*-1) 

II  II 


c*-l  —T 

'  rk-lrk-l 


Vk 


-ck- 1 

ll«*ll 


rk-\Ark_x . 


Hence,  if  the  symmetric  part  of  A  is  indefinite,  we  may  have  rk_ xvk  =0  at  some  step  before  the 
minimizer  has  been  found,  so  that  vk  will  not  be  a  suitable  choice  for  dk.  If  this  should  occur, 
however,  we  may  select  dk  and  subsequent  directions  dk+l, ...,  dm  from  vm }  in  an  order 

that  gives  rk_xdk  ^ 0  at  each  step,  until  a:*  is  found.  If  Tk-X »,•  =  0  for  all  remaining  j ,  k  <  i  <  m , 
then  xk_i  minimizes  g(x)  on  K(b,  A,  n),  and  we  are  finished.  We  have  the  latitude  of  using  the 
direction  vectors  {»,•}  in  any  order  in  an  algorithm  equivalent  to  GMRES,  since  the  iterates  { a*,- } 
are  not  actually  computed  in  the  GMRES  algorithm.  The  point  here  is  that  this  allows  our  con¬ 
vergence  analysis  to  apply  to  GMRES. 
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For  ORTHODIR,  we  are  assured  of  convergence  using  the  easily  computed  choice  of  dk 
shown  in  the  table  if  the  symmetric  part  of  Z  is  positive  definite  (or  negative  definite).  If  the 
symmetric  part  of  Z  is  indefinite,  this  choice  of  dk  will  not  ensure  descent  at  each  step.  In  that 
case,  we  can  guarantee  convergence  by  a  strategy  similar  to  that  employed  for  GMRES.  For 
ORTHODIR,  using  dx  =  b  and  dk  =  Apk_x  for  k >2,  we  have 

djr0=  bTZb, 

and  for  k  >  2, 

d? Ft-i  =  r’k-xApk-i 

~  r’k-l  (r*-2“  rk- 1)  • 

Note  that  since  F*_,  is  orthogonal  to  K(b ,A,k-l), 

k—2 

ri-i r*_2  =  r'kii  (6  -Axk_2)  =  rk_x(b  -  J]  APl) 

i=i 

so 

dk  rk- 1 - **_ir*_i 

=  -  rk~\Zrk~i . 

Thus,  if  the  symmetric  part  of  Z  is  positive  definite  (or  negative  definite),  dk7k_x  is  always 
nonzero  whenever  rk_ j  is  nonzero,  and  ORTHODIR  converges.  Now,  if  the  symmetric  part  of  Z  is 
indefinite,  ORTHODIR  is  not  guaranteed  to  give  descent  at  every  step,  but  is  still  convergent. 
ORTHODIR  generates  a  sequence  of  vectors  {?,■}  that  are  mutually  ZA -conjugate  (when  the  sym¬ 
metric  part  of  ZA  is  positive  definite)  and  that  form  a  basis  for  the  Krylov  space  K(b,A,n)  . 
These  vectors  are  determined  as 

<lo=  r0=  b 

and  for  k  =  1,  .  .  .  ,  t 

9*  =  Aqk-i  +  Yj  > 
y=o 

where  t+1  is  the  dimension  of  the  Krylov  space  K(b,A,n)  ,  and  for  0  <  j  <  k- 1  , 

h(k)  _  _  <l] Z A2 Qk-\ 

1  qfZAqj 
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The  ORTHODIR  algorithm  using  an  auxiliary  matrix  Z  that  is  only  required  to  be  nonsingular 
can  be  viewed  as  a  GCD  algorithm  in  the  following  way.  The  distinct  iterates  of  ORTHODIR  are 
produced  by  GCD  with  the  choice  dk  €  {g0,  •  •  •  ,qt}  ,  with  dk  =  q ^  provided  ^  0.  If 

Qk-irk-\  =  ^en  xk_i  minimizes  g(z)  on  sp{q0,  •  •  •  ,qk_ j}  =  K(b,A,k)  ,  and  this  step,  which 
would  give  xk  =  z*_i  in  the  ORTHODIR  algorithm,  is  skipped  in  GCD.  Thus,  dk  is  the  first 
occurring  element  of  {jn,  '  '  '  ,qt}  that  is  not  orthogonal  to  the  gradient  of  q  at  xk_i  . 

The  Truncated  Generalized  Conjugate  Directions  Algorithm 

An  alternative  to  choosing  dk  so  that  only  a  fixed  number  of  previous  directions  are  needed 
in  the  calculations,  while  still  avoiding  the  increasing  storage  and  work  at  each  iteration  associ¬ 
ated  with  keeping  all  of  the  previous  directions,  is  to  limit  to  m  the  number  of  previous  directions 
saved,  and  at  each  step  to  minimize  q(x)  over  an  affine  space  that  is  a  translate  of 
8P  {Pr>  Pk-it  dk  },  where  r=  max(&-m,  1).  This  results  in  the  truncated  version  of  the  General¬ 
ized  Conjugate  Directions  algorithm. 

Generalized  Conjugate  Directions  (m)  Algorithm 
*o  =  0 ,  F0  =  h  ,  k  =  1 . 
while  r*_t  yZ  0 
do 

get  dk  such  that  dk  ?*_!  yA  0 

r  =  max(fc-m,  I) 

xk  —  ar  grain  q(x) 

*€{*r_i +>p{pT,  .  .  .  ,  Pt-j.rf*}} 

Pk  =  xk-  xk_x 

7k  =  7k-\-Hpk 

k  =  k  +  1 


end  do 
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Analogous  to  Theorem  1,  we  have  our  second  main  theorem. 

Theorem  2:  If  H  is  symmetric  positive  definite,  then  the  following  statements  are  true  about  the 
GCD(m)  algorithm  (Let  r=  max(fc-m,  1))  : 


(i) 

The  algorithm  terminates  if  and  only  if  xk_i  =  x 

(ii) 

xk  minimizes 

q(x)  on  {xrA  +  sp{pT . pk}}. 

(iii) 

Every  pk  generated  by  the  algorithm  is  nonzero. 

(iv) 

rfp/=  0  , 

T  <  j  <  k 

(v) 

pjHpj  =  0  , 

1  i-j  1  <  ™ 

(vi) 

dim  sp{prt... 

(vii) 

pjHpj  =  pjf 

r-l,  T  <  3  <  k 

The  proof  of  Theorem  2  is  similar  to  that  of  Theorem  1,  and  is  omitted. 

The  linear  system  that  determines  xk  in  the  GCD(m)  algorithm  is  analogous  to  that  of  the 
GCD  algorithm.  At  each  step  of  the  GCD(m)  algorithm,  xk  is  chosen  to  satisfy 


argmin  g(x). 

*e{*^!+»p{pr  . .  •  ,p*_, .<<*}} 


Let  Pk  =  {pT,  ,  pk-i,dk],  x  =  xT_i  +  Pkc,  and 


?(c)  =  9(^*c  +*r-i)  =  \{pkC  +xT-i)TH{Pkc  +  xr.1)-hT[Pkc  +xT_1). 

Then  Vq(c)  =  PkH{Pkc  +  xT^l)-Pk'h.  Pk  has  full  column  rank,  so  that  PkHPk  is  positive 
definite,  and  xk  can  be  found  by  solving  the  linear  system  PkHPkc  =  Pk{h-HxT_x)  {=Pk7T_ i)  for 
c ,  and  setting  xk  =  xr_!  +  Pkc. 


The  solution  to  the  linear  system  is 
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c  = 


1  -  a* 


PrHdk 

vjHPr 


,  _  Pk-\Hdk 

^  ^  T  TT 

Pk-lHPk-l 


*-l 


dfrr-l-  S  d?HPi 


,  where  ak 


3=r 


**_£«*£ 


y=r  Pj  Hpj 


Again,  we  define  for  r<  /  <  k- 1 


Then 


and 


dk  rk-l _ 

4rtf(4  +  §  /5,WPy)  ’ 


**  =  *r-l  +  Pk  «  =  *r-l  +  £  Py  +  "*(4  +  £  $A)Py) 

y=r  y=r 

=  +  a*(<4  +  Yj  fflPj)  • 

3—T 


To  ensure  convergence  for  the  truncated  version  of  the  algorithm,  we  slightly  strengthen  the 
requirement  that  <4  n°t  he  orthogonal  to  the  gradient  of  q  at  x*_i,  asking  that  the  angle  between 
dk  and  7k_x  be  bounded  away  from  90°.  We  use  the  notation  n(H)  =  Xmix(//)/Xmin(ff)  for  the 
condition  number  of  a  symmetric  positive  definite  matrix  H . 

The  following  theorem  shows  a  q-linear  rate  of  convergence  for  the  entire  class  of  GCD(m) 
algorithms.  But  it  is  pessimistic  because  the  rate  constant  that  appears  is  a  lower  bound  on  the 
amount  of  reduction  that  would  be  achieved  with  a  GCD(O)  algorithm.  We  shall  remark  at  the 
end  of  the  section  on  the  relationship  between  this  bound  and  other  bounds  given  for  special 
cases. 


Theorem  3:  If  H  is  symmetric  positive  definite,  dkj^0  for  all  k,  and  there  exists  7  >  0  such  that 
I  I  >  7 1 1  dk  ||  2 1 1  r/t-i  1 1 2  f°r  then  the  sequence  of  iterates  {z*}  general, ed  by  a 

GCD(m)  algorithm  converges  to  a;*  and  satisfies 


17 


II  **-*' 


Proof: 


If  r*_ i  =  0  for  some  k ,  then  xk_x  =  x  *.  Assume  r*_!  ^  0  for  all  Jb .  Then 

?(**)-  1)  =  -^-(**-1  +  P*)r^(ar*-i  +  Pt)-  h  T{*k-i  +  Pk)~  j  *k-iHxk_x  +  h  Txk.1 

=  VkHxk- 1  +  y  P*rHp*  ~pA 

=  -P*’'r*-i+  jP*rri-i 

1  r_ 

=  -TPtrt-l, 

k 

?(**)  =  E 

/-i 

=  -lS  P/F;-i  <  °- 

z  y=i 


* 

Since  /f  is  positive  definite,  g(a:)  is  bounded  below,  so  we  conclude  that  lim  p/*j-i 

*_>0°  =1 

exists.  Hence,  lim  ptrr*_j  =  0,  and 
*-»  00 


F*-1P*  =  '■/-l  +  E  ^j*)Py)l 

=  «*»f- 14 


(F*V*)2 


y=r 


(rf/flp,)2 ' 

pJ'Hpj 


Since  all  the  terms  in  the  summation  are  nonnegative,  this  is 


> 


> 


(F*v*)2 


dkTHdk 

"T2 1 1  F*-i  1 1 1 1 1  III 


Wtf)IKII; 


72 


xraax(//) 


llr*-i 


>  o. 


We  have  0 


lim  rk_ipk  > 

k~+  oo 


T2 


lim  1 1  r*-i  1 1 1 

*  — ►OO 


>  0,  so  that  {rt}  converges  to  0.  Since  H 
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is  positive  definite,  this  is  necessary  and  sufficient  for  lim  xk  =  x* . 

k-+oo 

We  have 


and 


Now 


so  that 


and 


Thus, 


9  (**-i) -$(**)  =  jPtVi 

=  1 

^  S  n 


2Xm„(//) 


r*_! 


2 

2  ) 


II**-**  III  =  2[?(z*)-5(:r*)] 

=  2[9(*i)-9{x*_i)+g(a:fc_1)-?(a:*)] 

<  1 1  **-i  ~  2  *  1 1 1  “  T  ^7777  1 1  III- 

^maxl**  / 


F(ar)  —  h-Hx  =  H(x‘ -x), 


II*-**  III  =  r(x)r  H~lT(x) , 


l|r*-i||l  >  Xmin(i/)  ||**_1-*'|||. 


II**-*'IU<  ll**-l-*‘lll 


xma*(tf) 


=  ’■-afe)  n^-'ii*- 


l  **- 1  - 


2 


Convergence  of  ORTHOMIN(fc)  is  quickly  and  easily  verified.  For  ORTHOMIN(i), 
4  =  rk-u  and 

dk^k-i  =  rkLiATrk_ i  =  r^Ar^  =  r£j  (A±d_)  rt  , . 

With  the  assumption  that  the  symmetric  part  of  A  is  positive  definite,  our  hypothesis  is 


satisifed. 
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Let  M  = 


A  +At 
2 


.  Then 


<**V*-i  =  fk-iMrk_i  >  Xmin(Af)  1 1  r*_!  1 1 1 

~  imr  n  *^_i  ii2  ii  ii2‘ 

If  the  symmetric  part  of  A  is  negative  definite,  we  can  use  the  bound 

I  <*/ r*_ 1 1  =  |  rf_iMrk_i  |  >  - Xmax(M )  1 1  rk_x  ||f . 


If  the  symmetric  part  of  A  is  indefinite,  convergence  of  ORTHOMIN  (it)  is  not  guaranteed. 
The  algorithm  can  fail  to  produce  descent  at  some  step  in  that  case,  since  dk7k_x  =  rk_xMrk_x  can 
be  zero  without  rk_x  being  zero. 

The  bound  7  that  appears  in  Theorem  3  is  easily  obtained  for  CG,.  CGNE,  Craig’s  method, 
and  PCG.  For  CG,  CGNE,  and  Craig’s  method,  dk7k_x  =  ||i*  ||2||F*_i||2,  so  that  7=1.  We 
have  already  established  a  bound  for  ORTHOMIN  (k),  which  also  applies  to  CR,  GCR,  and 
ORTHOMIN.  Similarly,  for  PCG  one  can  obtain  7=  l//c(M_1). 

A  better  convergence  rate  constant  than  that  shown  in  Theorem  3  can  be  obtained  for  par¬ 
ticular  choices  of  H  and  dk.  From  the  proof  of  Theorem  3 


9  -?(**)  > 


2  d?Hdk  ’ 


which  gives 


||**-*'||2<  IK-i-z'll 

and  it  may  be  advantageous  to  use  this  rather  than  the  coarser  bound  involving  7  when  dk  and  H 
are  known.  For  example,  for  CR,  GCR,  ORTHOMIN,  and  ORTHOMIN(fc), 

H  =  Ar  A,  dk  =  rk_i ,  7k  =  Arrk,  and  \\xk-x‘\\%  =  \  \  rk  \  |  f , 

so  that 
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(4K-i)2 

d?Hdk 


rk-\A  TArk_l 


X2in(M) 

lm„(A  T  A ) 


1 1  *•*_!  III.  where  M  = 


A  +At 


Kill  —  1-V 


Mta) 


IK-i  IN 


Elman  (1982)  derives  this  q-linear  error  bound  on  {  |  K  1 1 2}  on  Pa8e  49  in  the  proof  of  his 
Theorem  5.9.  However,  he  weakens  it  in  the  statement  of  the  theorem  to  the  r-linear  error  bound 

Hr  II 2  fl  |  ||  ||  2 

lir,"2£  l1~wIrT)l  "  olu-  ■ 


For  PCG, 


H  =  A  ,  dk  =  M  K-!  >  where  M  —  C2  is  the  preconditioner,  rk  =  rk  , 


KK-i)2  {rf-iM  ‘r^H2 


[((7-1rt_i)r(C_1r*_1)]2 


dkTHdk  rk_iM~lAM~x r*_!  (G-1ri_1)7’(<r1J4CT1)((7-1r*.1)  ~  \mJ,C-'AC~') 

Xmj^C1  ^4.(7  *)  .  ..  2  1 1  xk- 1  ~ *  ||^ 

~  Xmax(C'-1AG-1)  11  k(C7-1AC'-1)  ’ 


This  q-linear  error  bound  implies  the  weaker  r-linear  error  bound 


K-*'ll*<  1-777^ I 


KiC-'ACT1) 


I ar0— ar*  \\\  . 


The  Chebychev  polynomial  approach  yields  only  an  r-linear  error  bound,  but  it  is  much  better 
than  the  r-linear  error  bound  above,  since  k{C~1  AG~1)2  appears  in  place  of  k{C~1AC~x)  . 


Conjugate  Directions  in  Quadratic  Programming 

The  formulation  of  the  generalized  conjugate  directions  algorithm  that  has  been  presented 
for  solving  systems  of  linear  equations  can  be  readily  adapted  to  give  a  conjugate  directions  algo¬ 


rithm  for  solving  the  equality  constrained  quadratic  programming  problem 
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minimize  q(x)  =  ^- xT Hx -hT x 
subject  to  Ax  =  0 , 

where  we  assume  that  H  is  symmetric  positive  definite  on  the  nullspace  of  A ,  and  that  A  has  full 
row  rank.  Given  an  initial  feasible  point,  a  problem  with  inhomogeneous  constraints  can  readily 
be  put  in  this  form. 

Starting  with  the  initial  feasible  point  x0  —  0,  the  algorithm  will  maintain  feasibility  at  each 
step,  and  will  produce  a  collection  of  linearly  independent,  mutually  //-conjugate  direction  vec¬ 
tors  {p,}  as  in  the  GCD  algorithm.  Here,  the  direction  vectors  will  all  be  in  the  nullspace  of  A, 
and  the  algorithm  will  terminate  after  no  more  than  a  steps,  where  a  is  the  dimension  of  the 
nullspace  of  A.  The  case  where  A  is  the  row  of  all  ones  arises  in  some  problems  from  conserva¬ 
tion  laws. 

Let  P  denote  the  projector  onto  the  nullspace  of  A:  P  =  I -AT{AAT)~lA.  To  solve  the 
quadratic  programming  problem,  the  GCD  algorithm  is  modified  so  that  the  termination  criterion 
is  that  the  projected  gradient,  Pfk-u  is  zero,  and  at  each  step  we 

choose  dk  so  that  dk  is  not  orthogonal  to  the  projected  gradient  of  q  at  the  current  point: 
d?Prk- i^O; 

let  dk  =  Pdk ; 

minimize  q (a:)  over  ap  {p  j, ...,  pk_ i,  dk}  . 

Since  P  is  symmetric,  dkPrk_l  =  dk¥k_l,  so  that  the  modified  algorithm  operates  in  exactly 
the  same  way  as  the  GCD  algorithm,  and  Theorem  1,  with  n  in  part  (i)  changed  to  a,  applies  to 
the  quadratic  programming  conjugate  directions  algorithm. 
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